Maximal frequent itemset generation using segmentation approach
Abstract
Finding frequent itemsets in a data source is a fundamental operation behind Association Rule Mining. Most algorithms use either a bottom-up or a top-down approach to find these frequent itemsets. When the frequent itemsets to be found are long, traditional algorithms must find all frequent itemsets from length 1 up to length n, which is an expensive process. This problem can be avoided by mining only the Maximal Frequent Itemsets (MFS), that is, frequent itemsets that have no proper frequent superset. Generating only the maximal frequent itemsets reduces both the number of itemsets produced and the time needed compared with generating all frequent itemsets, because each maximal itemset of length $m$ implies the presence of $2^m - 2$ further frequent itemsets (its non-empty proper subsets). Furthermore, mining only the maximal frequent itemsets is sufficient in many data mining applications, such as minimal key discovery and theory extraction. In this paper, we propose a novel method for finding the maximal frequent itemsets in huge data sources based on segmenting the data source and prioritizing the segments. Empirical evaluation shows that this method outperforms various other known methods.
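The $2^m - 2$ count follows from downward closure: every non-empty proper subset of a frequent itemset is itself frequent, so listing only the maximal itemsets loses no information about which itemsets are frequent. The sketch below illustrates the definitions with a plain Apriori-style, bottom-up baseline followed by a filter for maximal itemsets. It is not the segmentation and prioritization method proposed in the paper; the function names `frequent_itemsets` and `maximal` are hypothetical, and `min_support` is assumed to be an absolute transaction count.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Level-wise (Apriori-style) enumeration of all frequent itemsets.

    Hypothetical baseline for illustration; min_support is an absolute count.
    """
    transactions = [frozenset(t) for t in transactions]
    items = sorted({i for t in transactions for i in t})

    def support(itemset):
        # Number of transactions that contain every item of the itemset.
        return sum(1 for t in transactions if itemset <= t)

    level = [frozenset([i]) for i in items if support(frozenset([i])) >= min_support]
    frequent = set(level)
    k = 2
    while level:
        # Join step: unions of two frequent (k-1)-itemsets that form a k-itemset.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: keep a candidate only if all its (k-1)-subsets are frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in frequent for s in combinations(c, k - 1))
        }
        level = [c for c in candidates if support(c) >= min_support]
        frequent.update(level)
        k += 1
    return frequent


def maximal(frequent):
    """Keep only the frequent itemsets that have no proper frequent superset."""
    return {s for s in frequent if not any(s < t for t in frequent)}


tx = [{"a", "b", "c"}, {"a", "b", "c"}, {"a", "b"}, {"d"}]
freq = frequent_itemsets(tx, 2)
print(len(freq))                                      # 7
print(sorted(sorted(s) for s in maximal(freq)))       # [['a', 'b', 'c']]
```

With a minimum support of 2, the single maximal itemset {a, b, c} (length $m = 3$) accounts for the other $2^3 - 2 = 6$ frequent itemsets, which is the saving the abstract describes.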
Similar resources
Ramp: Fast Frequent Itemset Mining with Efficient Bit-Vector Projection Technique
Mining frequent itemsets using a bit-vector representation is very efficient for dense datasets but highly inefficient for sparse datasets, due to the lack of an efficient bit-vector projection technique. In this paper we present a novel, efficient bit-vector projection technique for sparse and dense datasets. To check the efficiency of our bit-vector projection technique, we present a ...
Discovering Maximal Frequent Item set using Association Array and Depth First Search Procedure with Effective Pruning Mechanisms
The first step of association rule mining is finding all frequent itemsets. The generation of reliable association rules is based on all the frequent itemsets found in this first step. Obtaining all frequent itemsets in a large database largely determines the overall performance of association rule mining. In this paper, an efficient method for discovering the maximal frequent itemsets is proposed. This met...
Top Down Approach to find Maximal Frequent Item Sets using Subset Creation
Association rule mining has been an area of active research in the field of knowledge discovery. Data mining researchers have improved the quality of association rule mining for business development by incorporating influential factors such as value (utility), quantity of items sold (weight) and more into the mining of association patterns. In this paper, we propose an efficient approach to find maxi...
An Algorithm for Mining Maximum Frequent Itemsets Using Data-sets Condensing and Intersection Pruning
Discovering maximal frequent itemsets is a key issue in data mining. Apriori-like algorithms use a candidate generate-and-test method, but this approach is highly time-consuming. To obtain an algorithm that avoids both generating a vast number of candidate itemsets and building a frequent pattern tree, the DCIP algorithm uses data-set condensing and intersection pruning to f...
MaRFI: Maximal Regular Frequent Itemset Mining using a pair of Transaction-ids
Frequent pattern mining is a fundamental and dominant research area in data mining. Maximal frequent patterns are one of the compact representations of frequent itemsets. Many algorithms exist for finding maximal frequent patterns that are suitable for mining transactional databases. Users are not only interested in occurrence frequency but may also be interested in frequent patterns tha...
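Several of the related methods above (for example, the bit-vector representation in the Ramp summary and the intersection pruning in the DCIP summary) rely on a vertical data layout in which each item carries the set of transactions that contain it. The sketch below shows only that general idea, assuming Python integers serve as bit-vectors: the support of an itemset is the number of set bits in the bitwise AND of its items' vectors. It is not Ramp's projection technique or DCIP's algorithm, and the helper names `build_bitvectors` and `support` are hypothetical.

```python
def build_bitvectors(transactions):
    """Map each item to an int whose bit `tid` is set iff transaction `tid` contains it."""
    vectors = {}
    for tid, t in enumerate(transactions):
        for item in t:
            vectors[item] = vectors.get(item, 0) | (1 << tid)
    return vectors


def support(itemset, vectors, n_transactions):
    """Support = popcount of the AND of the items' bit-vectors (an intersection of tid-sets)."""
    bits = (1 << n_transactions) - 1  # start from "all transactions"
    for item in itemset:
        bits &= vectors.get(item, 0)  # unknown items contribute an empty vector
    return bin(bits).count("1")


tx = [{"a", "b", "c"}, {"a", "b", "c"}, {"a", "b"}, {"d"}]
v = build_bitvectors(tx)
print(support({"a", "b"}, v, len(tx)))  # 3
```

Because each intersection is a single machine-level AND per word, this layout tends to suit dense data; on sparse data most bits are zero, which is the inefficiency the Ramp summary attributes to plain bit-vectors.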
Journal: CoRR
Volume: abs/1109.2427
Publication date: 2011